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Abstract 

This paper presents two novel regularization methods motivated in part by the geometric significance of 
biorthogonal bases in signal processing applications. These methods, in particular, draw upon the structural relevance 
of orthogonality and biorthogonality principles and are presented from the perspectives of signal processing, convex 
programming, continuation methods and nonlinear projection operators. Each method is specifically endowed with 
either a homotopy or tuning parameter to facilitate tradeoff analysis between accuracy and numerical stability. An 
example involving a basis comprised of real exponential signals illustrates the utility of the proposed methods on an 
ill-conditioned inverse problem and the results are compared to standard regularization techniques from the signal 
processing literature. 
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1. Introduction 


Signal processing algorithms, notably those in machine learning and big data applications, make ubiquitous 
use of dense and sparse linear algebra subroutines [1], [2]. Numerical errors often arise while implementing 
such subroutines and are typically explained using perturbation analysis tools. For example, finite-precision effects 
such as coefficient quantization are quantifiable by invoking sensitivity theorems after organizing the computation 
into a linear or nonlinear signal-flow graph, as discussed in, e.g., [3]. Also, the magnification and compounding 
of propagated errors are commonly understood via the condition number of a functional representation of the 
processing system. These unavoidable sources of error contribute significantly to the total accumulated error in an 
obtained solution, especially as problem sizes continue to scale. Algorithms specifically designed to use orthogonality 
principles alleviate these issues and, among other reasons, receive extensive use in practice. Moreover, orthogonality 
in other contexts such as quantum measurement and asynchronous distributed optimization continue to provide 
inspiration for new signal processing algorithms, architectures, and applications [4], [5]. 

Signal processing techniques exploiting natural and efficient representations for signal models with inherent 
geometric structure often use non-orthogonal bases and frames that may generally suffer from conditioning issues, 
e.g. finite rate of innovation [6], compressed sensing [7], and transient signal analysis [8]- [11]. To address these 
issues, we restrict our attention in this paper to two classical approaches: orthogonalization and regularization. 
Traditional orthogonalization routines typically destroy the topological relevance of an original basis to the problem 
at hand while simultaneously weakening the informational significance of the solution obtained. The orthogonal 
Procrustes problem has been used for subspace whitening and machine learning in the matrix setting, i.e. generating 
the unitary matrix that optimally maps one matrix to another in the Frobenius norm sense [12], [13]. In this 
paper, we specifically translate and explicate these results using bases and further extend them as they pertain to 
biorthogonalization, nonlinear projections, and novel regularization methods. The proposed regularization methods 
permit natural characterizations of accuracy and stability enabling standard tradeoff analysis to identify an acceptable 
balance. Regularization methods in signal processing have historically focused on directly obtaining a solution biased 
toward prespecified signal properties. In this paper, however, we emphasize the effect the proposed methods have 
on the linear problems Euclidean structure itself. 

This paper is outlined as follows: In Section II we collect together geometric facts surrounding orthogonality and 
biorthogonality that will be referenced throughout. The orthogonalization and biorthogonalization of a linear system 
via a nonlinear projection operator is the central focus of Section III. Two regularization methods for balancing 
numerical robustness in the sense of relative conditioning and accuracy in the sense of geometric structure are 
developed in Section IV. The proposed methods are then applied to the regularization of an ill-conditioned basis 
composed of real exponential signals in Section V as a numerical example. Throughout the presentation, we draw 
no distinction between various linear problems and the mathematical relationships connecting them, e.g. matched 
filtering, solving linear systems, changing bases, etc. In this way, the insights and methods presented may be readily 
applied to any linear problem equally well. 

A. Notational conventions 

Vectors are denoted using underscores with subscripts indexing vectors as opposed to entries, i.e. and 02 
represent two distinct vectors. Boldface letters denote a system or collection of N vectors, typically constituting a 
basis for M^, i.e. a = A biorthogonal set is denoted using i.e. a is biorthogonal to a. The p-norm 

of a vector a for p > 1 is denoted ||a||p. A capital letter denotes the matrix form of a set of vectors, i.e. A 
has columns a. The Kronecker delta is denoted 6kj. For clarity we assume all systems are real-valued bases; 
rigorous extensions to complex-valued bases and frames follow analogously, e.g. by using appropriate conjugation 
and restricting arguments to the range of a system. 

II. The topology of the Stiefel manifold and 0{N) 

Let a and b denote two general systems and define cbo as the nonnegative measure of their biorthogonality 
given by 

eBO (a, ^) - 

k j 


( 1 ) 


which achieves a unique global minimum of zero provided that b = a. It is then straightforward to conclude that 


cbo (sl, a) = 0 <^=4> a is an orthonormal system. 


( 2 ) 


The Stiefel manifold Vk^N is the orthogonal groups principal homogenous space (k^N) and is written using (1) 
as [14] 


Vv,iv = eBo(h,h) =0}. 


( 3 ) 


Represented as a quadratic form, (1) is convex quadratic with positive signature, i.e. all eigenvalues are positive 
valued, thus any optimization problem that uses (1) as a cost function or (3) as a feasible set is not precluded from 
being a convex problem. 

The orthogonal group of dimension iV, denoted by 0{N) and described in set-builder notation using the matrix 
form of h as 

0{N) = {H e = In} (4) 

where /at is the identity operator on M^, consists of two connected components A4i and M-i. Furthermore, A4i 
and A4-1 each form a disjoint smooth manifold in characterized by 


Mi = {II e 0(N): det(H) = i} , i = ±1 (5) 

where det( ) denotes the determinant operator. Note that A4i is precisely the special orthogonal group SO{N). 
From (5) and the definition of a connected component it immediately follows that 

H e Mi^ H e Mi, i = ±1. (6) 


We conclude this section by noting that appropriately selecting a base point for Vn,n establishes a one-to-one 
correspondence with 0{N) and, therefore, justifies interchanging between the two notational descriptions of an 
orthogonal system, i.e. h and H. 


III. Orthogonalization and biorthogonalization 


Let a and b continue to denote two general systems and define cls as the nonnegative Euclidean measure of 
their distance given by 


eis (a, b) 


— Nl ll^fc ~ 

k 

= L/ -kklOLk- kk) 

k 


( 7 ) 

( 8 ) 


which achieves a unique global minimum of zero provided that b = a. In the sequel we shall denote by g a general 
prespecified system. 


A. Optimal orthogonalization problems 

Consider the optimization problem that identifies the nearest orthogonal system to g in the sense of (7), or 
equivalently the least-squares orthogonalization of g, given by 

hi = argmin els (g, h) s.t. ebo (h, h) = 0. (PI) 

h 

Similarly, consider the optimization problem that identifies the nearest orthogonal system to the biorthogonal system 
g in the sense of (7) given by 

£2 = argmincLs (g, h) s.t. ebo ( h , h) = 0. (P2) 

h 

We collectively refer to (PI) and (P2) as optimal orthogonalization problems and further interpret their solutions 
as orthogonal systems that maximally preserve the geometric structure inherent to g and g, respectively. 



Fig. 1. An illustration of the relationships between the matrix forms of g and g and their adjoints and their projections onto the orthogonal 
group via (13) where the projection operator is denoted using V{‘). The system g is chosen such that det(G) > 0 and therefore the projections 
are onto Mi. 


B. Optimal biorthogonalization problems 

Consider the optimization problem that identifies the orthogonal system which is maximally dual to g in the 
sense of (1) given by 

hs = arg min €30 (g, h) s.t. €30 (h, h) = 0. (P3) 

h 

Finally, consider the optimization problem that identifies the orthogonal system which is maximally dual to the 
biorthogonal system g in the sense of (1) given by 

h 4 = arg min €30 (g, h) s.t. €30 (h, h) = 0. (P4) 

h 

We collectively refer to (P3) and (P4) as optimal biorthogonalization problems and further interpret their solutions 
as the orthogcmal systems that, switching to the matrix form notation introduced in Subsection I-A, maximally 
invert G and G, respectively. 


C. Equivalence of cls cmd cbo over Vn^n 

The measures (1) and (7), respectively interpreted as optimal biorthogonalization and orthgonalization cost 
functions, produce generally unequal cost surfaces over the space of all W-dimensional bases. However, for fixed 
g and orthogonal h, it follows that 


eBo(g,h) 




^ V ^ 5 ^ 5 —k ) i—k 5 —k ) ^ 

k 


(9) 

( 10 ) 


^ 'Ekk ~ Ik - h) ( 11 ) 

k 

= 6L5(g,h) (12) 

where the restriction of h to Vn^n is used to obtain (10) from (9), in particular since (a, 6) = J2j{^^hij){hj^k) and 
^kj = {hk^hj)- 


D. Analytic solution and projection interpretation of (P1)-(P4) 

In Subsection III-C, we established that the solution to (PI) solves (P3) and likewise that the solution to (P2) 
solves (P4). In this subsection, we first state and interpret the analytic expression that minimizes (PI) followed 
by discussing one way (PI) and (P2) may be argued to have the same minimizer. Indeed, the matrix form of the 
orthogonal system hi which solves (PI) is given by 

= uv^ 


( 13 ) 







where G = is the Singular Value Decomposition (SVD) of the matrix form of g. Observe that verifying 

the orthogonality of (13) folllows immediately from the closure property of 0{N); a sketch of the full argument 
that (13) minimizes (PI) is deferred to the Appendix. Interpreting (13) as a nonlinear operator acting upon G, it is 
straightforward to verify idempotency, hence (13) is the projection of G onto 0{N). Figure 1 illustrates this, i.e. that 
the action of (13) on G, G and their transposes is a projection onto the orthogonal group. From the perspective of 
bases, generating (13) consists of N radial projections along the semiaxes of the ellipsoid described by the SVD 
of G such that the associated singular values become unity. 

An equivalence between (PI) and (P2) is established by using the matrix and basis representation equivalence 
discussed earlier where the matrix form of g, i.e. the linear functionals biorthogonal to g in the dual linear space, 
corresponds to the inverse of the adjoint of G. The desired result is obtained by straightforward manipulations that 
further simplify (P2), in particular by exploiting SVD properties. For arbitrary systems g, we comment that it is 
possible to identify a priori which manifold (13) belongs to, i.e. combining the manifold characterizations in (5) 
and using the connected component property in (6) we conclude that 

H ^ Mi ^ sgn (det(G)) = i, i = ±1. (14) 

E. Incremental projection methods for nearly orthogonal g 

Although many numerical algorithms exist to compute (13), we call special attention to the case of G being 
“nearly” orthogonal. Let R = G^G — /at be a Grammian residual. Re-writing (13) as 

# = G(/iv + i?)“^ (15) 


we proceed assuming the spectral norm of R is less than 1 which may be efficiently certified using, e.g. Gershgorin’s 
circle theorem [15]. After some straightforward manipulations we obtain the power series 

H^G 

and thus the solution to (P1)-(P4) may be generated to any desired level of accuracy without explicitly computing 
an SVD by truncating (16) to an appropriate number of terms. 



IV. Trading accuracy for numerical stability 

Many applications of signal processing as well as engineering more broadly require considerably less accuracy 
than what is available on modem computational platforms. By pairing systems in Vn^n with backward stable 
algorithms, computation can be made to not magnify propagated errors. Motivated by these observations, we 
next propose two regularization methods that facilitate accuracy and stability tradeoffs readily explained through 
homotopy maps and convex analysis, respectively. We comment upfront that while our treatment focuses on 
interpretation, the utility of these methods is in regularizing ill-conditioned problems while preserving some of 
the original systems Euclidean structure as is illustrated in Section V. 

Regularization methods are often described initially using the language of optimization theory; the methods 
surveyed in Subsection V-B serve as examples. Some regularization techniques, primarily those related to least 
squares formulations, admit solutions in closed-form. Setting these special cases aside, the remaining techniques 
conventionally solve directly for a solution to the problem at hand without first generating a regularized system. The 
methods proposed in this section specifically make this intermediary step, i.e. they explicitly generate a regularized 
linear system from which a solution is then generated, potentially making further use of any regularization techniques 
belonging to the class just mentioned in which the regularized system takes the place of the original system. 

A. Homotopic continuation formulation 

Motivated by the extensive use of floating-point arithmetic in big data computation, we measure the stability of 
a system g through its relative condition number, i.e. the ratio of extremal singular values of G. Loosely speaking, 
when this measure is on the order 10^ then only d — k digits of the computed solution are reliable where d is the 
maximum number available on the chosen computational platform [16]. This loss in accuracy is inherent to the 




system itself and says nothing about further degradations that accumulate due to a particular algorithms stability 
nor to errors that arise from round-off noise and coefficient quantization. 

We first introduce a continuation scheme to generate a parameterized family of regularized systems along the 
trajectory corresponding to the smooth deformation of g into hi. The utility of this scheme for solving a linear 
system of equations Gx = y, with known y, is to sequentially or approximately solve for x while substituting 
different systems along this continuum for G. The parameterization of the system mitigates numerical conditioning 
issues at the expense of accuracy, i.e. the continuation terminates when a system associated with an acceptable 
solution is identified. The function describing these deformations is referred to as a homotopy map. In this paper, 
we use the specific homotopy map 

h 5 (p, g) = [h ■ hk = + p^k where z solves (Pl)| (17) 

where the parameter p G [0,1] balances the previously addressed tradeoff. The family of linear systems achieved 
using the homotopy map is h 5 (p, g) gg for p G [0,1] and the standard implementation strategy is to track 
the solution to the problem using instead of G starting from (p, g) = ( 0 , g) as p progresses from 0 to p'^ where 
p'^ generates the system associated with an acceptable solution. In Section V, solutions to a similar problem are 
obtained by discretizing [0,1] and evaluating a performance metric for all values of p in this interval. Standard line 
search methods help to efficiently identify the optimal homotopy value p'^. 

Since the condition number of G is equal to the condition number of G, and with the equivalence between the 
minimizer of (PI) and (P2), we conclude the discussion in this subsection by noting that replacing g with g results 
in the same presentation with the geometric structure of the biorthogonal system g being preserved instead. 

B. Unconstrained optimization formulation 

Prompted by the exposition on characterizing numerical accuracy and stability in the previous subsection, we next 
formulate an optimization problem whose cost function is the sum of the metrics (7) and (1), i.e. each summand 
is respectively minimized by 115 ( 0 ,g) and 115 ( 1 ,g). We formally write this problem as an unconstrained quartic 
convex optimization problem of the form 

h 6 (p,g) = argmineL 5 (g,h) +peso(h,h) (18) 

h 

where p again determines the previously addressed tradeoff. Indeed, consistent with the discussion surrounding the 
positive definite nature of ( 1 ) represented as a quadratic form and the fact that the sum of unconstrained convex 
functions results in a convex function, it is straightforward to verify that any solution to (18) that is locally optimal is 
also globally so. Although the cost function of (18) is purposefully constructed using the endpoints of the continuum 
of regularized systems achievable using (17), it is readily possible to generate solutions to (18) that do not lie on 
this continuum nor are obtainable using (17) for any value of the homotopy parameter p. 

In response to the global optimality guarantees established above and for the sake of completeness, we conclude 
this section by including gradient primitives that may be used by any number of gradient-based nonlinear 
programming algorithms in solving (18): 

Vez.5(g,h) U =2 5^((5., - S^^k)g_^ (19) 

3 

VeBo(h,h) \h=4.{{h^,,hk)-l)hk + 2yy^{hj,hphj. (20) 

Note that an analytic expression for the system he which minimizes (18) is currently unknown due to the cubic nature 
of the first order optimality conditions, i.e. the nonlinear system of equations defined by setting the appropriately 
weighted sum of the two terms above equal to zero for each value of k. 


V. Numerical example 


Solving linear equations using regularization, a common approach to ill-posed problems, modifies the problem so 
that (i) the new problem is biased toward expected solutions and (ii) the previously mentioned numerical issues are 
reduced. In signal processing, regularization appears in many forms: Tikhonov regularization and Wiener filtering, 
total variation denoising in image processing, and basis pursuit denoising in compressive sensing. In this section, 
we use the methods developed in Section IV to solve a numerical example. 

A. Problem formulation 

Let E denote the matrix form of a real exponential basis e, i.e. E is Vandermonde and, for 0 < ai < • • • < 
aN < 1, generated by 

Eij = a^\ ( 21 ) 

The relative condition number of E, and consequently E too, grows exponentially in N [17]. This fact justifies 
using regularization to solve a linear problem of the form 

Ex = y (22) 

for X where y is analytically generated according to a synthetic solution x to help limit observed numerical effects 
to the regularization effect each method has on the matrix E in (22). 

B. Numerical methods for comparison 

This subsection briefly reviews three regularization methods from the signal processing literature, two of which 
originate in the compressive sensing setting, to compare against (17) and (18) in solving (22) [7]. The first method, 
Tikhonov regularization, is formulated as 

^ ^iixgmm\\Ex-y\\l + \\Tx\\l. (23) 

X — 

where we proceed to take the Tikhonov matrix T = plj^. The objective in (23) is convex quadratic, thus the 
solution has an analytic expression, namely of" = {E^E + p‘^I)~^E^y. Next, the basis pursuit denoising problem 
is formulated as 

E" = argmin \\Ex — y \\2 + p||^||i (24) 

X — 

where p balances the absolute size of the solution with the desired agreement of (22). Finally, the Dantzig selector 
is given by 

= &xgm\u\\E'^{Ex-y)\\oo + /9|kl|i (25) 

X — 

and may be solved using standard linear programming techniques. 

C. Numerical results 

Our experimental setup is as follows: We generate E on each trial by selecting = 18 values ak uniformly 
at random from the interval [0.1,0.9] and solve (22) using the various methods discussed. We select p to within 
10“^ for each method per trial to minimize the residual ||x^ — x|| 2 . The solution x'^ to (17) and (18) is found by 
directly solving the regularized equations via Gaussian elimination. Table 1 lists the average residual and value 
of p taken over 10^ trials. Solving (22) directly yields an average error of 8.64. The condition number of E for 
this experiment is lower bounded by 10^^. In summary, we remark that (17) outperforms the alternatives and (18) 
outperforms all but (23) for the Vandermonde linear systems tested. The larger error for (25) is due in part to the 
condition number of E^E being the square of the condition number of E. 

Figure 2 illustrates an accuracy versus numerical stability tradeoff curve for the proposed regularization methods 
where accuracy is given by the residual — x ||2 and numerical stability is given by the relative condition number. 
For both methods, it is clear that a non-zero value of p is optimal corresponding to a system whose conditioning is 
several orders of magnitude less than £”s. Figure 3 depicts a subset of regularized signals for the optimal selection 
of p in Figure 2. As depicted and consistent with elements not shown, the regularized signals have similar structure 



Fig. 2. Accuracy (residual error) versus numerical stability (relative condition number) tradeoff curves for (17) and (18). 






Coordinate number Coordinate number 

- Original real exponential signals 

- Real exponential signals regularized using (17) 

- Real exponential signals regularized using (18) 


Fig. 3. An illustration of a subset of the real exponential basis elements e and the corresponding elements from the proposed regularization 
methods. 


to e, i.e. they decay at similar rates to their exponential counterparts, also depicted for comparison. Signals produced 
by (18) retain a smooth nature with an additive offset and slight upward curvature and those produced by (17) 
contain small fluctuations that seem to alleviate conditioning issues. 


Table 1. Numerical comparison of regularization methods 


Solution reference 

(17) 

(18) 

(23) 

(24) 

(25) 

average \\x^ — x 2 
average optimal p 

0.096 

0.002 

0.153 

0.028 

0.139 

0.025 

0.147 

0.015 

13.46 

0.044 





































Appendix 


Sketch of the argument that (13) solves (PI): 

arg inin eLs(g,h) = arg inin '^{g, - hj,, g - hj.) 

heViv.N heyjv.N “ 

k 

= hiy« « ^ iSh'lk") + ^^Lk^k) - 2(5^, 4 fc)) 

k 

= arg nap {g,,h^) + --- + {gj^,hj^) 

heVN,N 

Switching to matrix notation for convenience, utilizing properties of a trace, and writing G using its SVD G 
we proceed as: 

arg max (q ., hu) arg max trace (VTiU^H) 

^HeO{N) ^ ^ 

= U ( arg max trace (Hi?) ) 

V HeO{N) 7 

= UV^ 
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